Researchers at UC San Diego introduce DFlash, a new speculative decoding technique that drafts entire blocks of tokens in parallel, achieving up to a 15x throughput improvement on NVIDIA Blackwell.